Power-Law Distributions for Paraphrases Extracted from Bilingual Corpora

Authors

  • Spyros Martzoukos
  • Christof Monz
Abstract

We describe a novel method that extracts paraphrases from a bitext, for both the source and target languages. To reduce the search space, we decompose the phrase-table into sub-phrase-tables and construct separate clusters for source and target phrases. We convert the clusters into graphs, add smoothing/syntactic-information-carrier vertices, and compute the similarity between phrases with a random-walk-based measure, the commute time. The resulting phrase-paraphrase probabilities are derived by converting the commute times into artificial co-occurrence counts with a novel technique. The co-occurrence count distribution belongs to the power-law family.
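
The commute time mentioned in the abstract is a standard graph quantity: for a connected, undirected graph it equals vol(G) * (L+_ii + L+_jj - 2 * L+_ij), where L+ is the Moore-Penrose pseudoinverse of the graph Laplacian L = D - A and vol(G) is the sum of all vertex degrees. The Python sketch below computes it for every vertex pair with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name commute_times and the toy adjacency matrix are hypothetical, edge weights are assumed symmetric and non-negative, and the paper's own conversion of commute times into artificial co-occurrence counts is not reproduced.

```python
import numpy as np


def commute_times(adjacency: np.ndarray) -> np.ndarray:
    """Return the matrix of commute times between all vertex pairs.

    Sketch only: assumes a connected, undirected graph with non-negative
    edge weights. Uses the identity C(i, j) = vol(G) * (L+_ii + L+_jj - 2 * L+_ij),
    where L = D - A is the graph Laplacian, L+ its Moore-Penrose
    pseudoinverse, and vol(G) the sum of all (weighted) vertex degrees.
    """
    a = np.asarray(adjacency, dtype=float)
    if not np.allclose(a, a.T):
        raise ValueError("adjacency matrix must be symmetric (undirected graph)")
    degrees = a.sum(axis=1)
    laplacian = np.diag(degrees) - a
    l_pinv = np.linalg.pinv(laplacian)
    d = np.diag(l_pinv)
    volume = degrees.sum()
    # Broadcasting forms L+_ii + L+_jj - 2 * L+_ij for every pair (i, j).
    return volume * (d[:, None] + d[None, :] - 2.0 * l_pinv)


# Hypothetical toy graph: three phrase vertices linked in a chain 0 - 1 - 2.
toy = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
print(commute_times(toy))
```

On this toy chain, adjacent vertices get a commute time of 4 and the two endpoints get 8 (up to floating-point error), so a smaller commute time marks a closer pair of phrases.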

Similar Resources

Paraphrasing with Bilingual Parallel Corpora

Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrase-based statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. We define a para...
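
The pivot idea in this abstract is commonly formalised as a paraphrase probability that marginalises over the foreign phrases f aligned to an English phrase e1: p(e2 | e1) = sum over f of p(e2 | f) * p(f | e1). The Python sketch below computes that sum from two phrase-translation tables. It assumes this standard formulation rather than reproducing the cited paper's full method; the function name pivot_paraphrases, the nested-dictionary table layout, and the toy English/German entries are hypothetical.

```python
from collections import defaultdict


def pivot_paraphrases(p_f_given_e, p_e_given_f, phrase):
    """Rank paraphrase candidates e2 for `phrase` (e1) through foreign pivots f.

    Sketch only: computes p(e2 | e1) = sum over f of p(e2 | f) * p(f | e1),
    skipping e2 == e1. Both tables are hypothetical nested dicts of the form
    table[conditioning_phrase][outcome_phrase] = probability.
    """
    scores = defaultdict(float)
    for pivot, p_f in p_f_given_e.get(phrase, {}).items():
        for candidate, p_e in p_e_given_f.get(pivot, {}).items():
            if candidate != phrase:  # a phrase is not counted as its own paraphrase
                scores[candidate] += p_e * p_f
    # Highest-probability paraphrases first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical toy translation tables (English e <-> German f).
p_f_given_e = {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}}
p_e_given_f = {
    "unter kontrolle": {"under control": 0.6, "in check": 0.4},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}
print(pivot_paraphrases(p_f_given_e, p_e_given_f, "under control"))
```

With these toy tables, "in check" scores 0.4 * 0.8 = 0.32 and "in hand" scores 0.5 * 0.2 = 0.10, so "in check" is ranked first, and "under control" itself is excluded.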

Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its a...

Paraphrasing Depending on Bilingual Context Toward Generalization of Translation Knowledge

This study presents a method to automatically acquire paraphrases using bilingual corpora, which utilizes the bilingual dependency relations obtained by projecting a monolingual dependency parse onto the sentence in the other language based on statistical alignment techniques. Since the paraphrasing method is capable of clearly disambiguating the sense of an original phrase using the bilingual context...

Syntactic Constraints on Paraphrases Extracted from Parallel Corpora

We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs. In order to retain broad coverage of non-constituent phrases, complex syntactic l...

Applicability Analysis of Corpus-derived Paraphrases toward Example-based Paraphrasing

Two kinds of paraphrases extracted from a bilingual parallel corpus were analyzed. One is from an adjectival predicate sentence to a non-adjectival one. The other is from a passive form to a non-passive form. The ability to extract paraphrases is strongly desired for paraphrasing studies. Although extracting paraphrases from multi-lingual parallel corpora is possible, the type of paraphrases ex...

Publication date: 2012